We present an alternative to global fitting procedures for constructing
parametrizations of Parton Distribution Functions (PDFs). The proposed
algorithm uses Self-Organizing Maps (SOMs) which, at variance with standard
Neural Networks, are based on competitive learning. SOMs generate a
non-uniform projection from a high-dimensional data space onto a
low-dimensional one (usually 1 or 2 dimensions) by clustering similar PDF
representations together. The SOMs are trained on progressively narrower
selections of data samples. The selection criterion is that of convergence
towards a neighborhood of the experimental data. All available data sets on
deep inelastic scattering in the kinematical region 0.001 < x < 0.75 and
1 < Q^2 < 100 GeV^2, with a cut on the final-state invariant mass,
W^2 > 10 GeV^2, were implemented. The proposed fitting procedure, at
variance with standard neural network approaches, allows for increased
control of the systematic bias by enabling the user to directly control the
data selection procedure at various stages of the process.
1. Introduction

Parton Distribution Functions (PDFs) are defined as the probabilities to
find a parton of type a (a quark, an antiquark or a gluon) in the proton at
a given value of the process scale, Q^2, the four-momentum transfer
squared, and of Bjorken's variable, x_Bj = Q^2/2Mν, ν being the energy
transfer and M the proton mass. x_Bj represents the light-cone momentum
fraction of the proton carried by the parton. Although PDFs have been
studied both theoretically and experimentally for the past few decades,
their determination is still hampered by a number of unsolved questions,
mainly concerning their Perturbative QCD (PQCD) evolution and, related to
this, the treatment of heavy-flavor quarks. Furthermore, this situation -
in particular the large indetermination of the gluon distribution - will
have critical practical consequences for the predictivity of results at the
LHC. PDFs were, in fact, recently defined as "a necessary evil". Our work
was indeed motivated by concerns similar to those expressed in Ref. [1].
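
As a small illustration of the kinematic variables defined above, the
following sketch computes x_Bj = Q^2/2Mν and applies the data selection
quoted in the abstract (0.001 < x < 0.75, 1 < Q^2 < 100 GeV^2,
W^2 > 10 GeV^2). The function names, the numerical value of the proton
mass, and the standard relation W^2 = M^2 + 2Mν - Q^2 are assumptions added
here for illustration; they are not taken from the analysis code.

```python
# Illustrative sketch only: names and constants are assumptions, not the authors' code.
PROTON_MASS_GEV = 0.938272  # proton mass M in GeV (standard value, not quoted in the text)


def bjorken_x(q2, nu, mass=PROTON_MASS_GEV):
    """Bjorken variable x_Bj = Q^2 / (2 M nu); Q^2 in GeV^2, nu in GeV."""
    return q2 / (2.0 * mass * nu)


def invariant_mass_squared(q2, nu, mass=PROTON_MASS_GEV):
    """Final-state invariant mass squared, W^2 = M^2 + 2 M nu - Q^2, in GeV^2."""
    return mass**2 + 2.0 * mass * nu - q2


def passes_cuts(q2, nu):
    """Apply the kinematic selection quoted in the abstract to one data point."""
    x = bjorken_x(q2, nu)
    w2 = invariant_mass_squared(q2, nu)
    return 0.001 < x < 0.75 and 1.0 < q2 < 100.0 and w2 > 10.0
```

For example, a point with Q^2 = 10 GeV^2 and ν = 20 GeV passes the
selection, while the same Q^2 at ν = 10 GeV fails the W^2 cut.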

To date, a few approaches have been developed to address a fully
quantitative determination of PDFs over a wide range of x_Bj and Q^2. On
one side we have Global Fitting (GF) procedures, pursued, developed and
refined since the beginning of QCD [a]. More recently, a number of
alternative approaches to GF have been pursued, the main ones being the
Neural Network (NN) approach [3] and Bayesian methods [4]. In both
Refs. [3,4], the authors are concerned with the definition and evaluation
of the PDF uncertainties from GF. In particular, the χ^2 obtained from the
GF procedure is most likely to underestimate both the theoretical and
experimental errors from the various data sets, as shown by the existence
of often large discrepancies in the results obtained by different
groups [1]. In Ref. [3], in particular, the main source of indetermination
is attributed to the theoretical bias introduced by the choice of
parametrization form of the PDFs at the initial scale, Q_0^2, of PQCD
evolution. However, implicit in NN algorithms is a hardly controllable
systematic bias. The approach we propose here is based on a specific class
of neural network algorithms, the Self-Organizing Maps (SOMs) (for a review
see Ref. [5]). SOMs allow for better control of the systematic bias by
replacing the fully automated procedure of standard NNs with an interactive
fitting procedure, at the expense of re-introducing some theoretical bias
in the fit. Our fitting procedure is based on an iterative process in which
the "user" interactively delineates the boundary between acceptable and
unacceptable results. Observables are clustered into a SOM and judged by
the "user". A statistical analysis of the corresponding initial-scale PDFs
is then performed and gives rise to the next iteration of PDFs. Several
criteria can be chosen by the user: minimization of the χ^2, satisfying
different sum rules, selection on the behavior at low or large x_Bj, etc.
In this contribution we show results based on the criterion of minimization
of the χ^2, which allows us to gauge and test our initial results against
previously existing ones [2].

[a] All results by the active groups in recent years are listed in
Ref. [2], and are also reported regularly at this conference.
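
The χ^2-minimization criterion used in this contribution can be sketched as
follows. This is a minimal illustration under stated assumptions: the χ^2
is taken with uncorrelated uncertainties, and the hypothetical parameter
`keep_fraction` stands in for the boundary between acceptable and
unacceptable results that the "user" draws interactively. Neither the
function names nor the selection rule are taken from the actual analysis.

```python
import numpy as np


def chi2(theory, data, sigma):
    """Chi-squared of one candidate prediction, assuming uncorrelated errors."""
    theory, data, sigma = (np.asarray(a, dtype=float) for a in (theory, data, sigma))
    return float(np.sum(((data - theory) / sigma) ** 2))


def select_candidates(candidates, data, sigma, keep_fraction=0.5):
    """Rank candidate predictions by chi-squared and keep the best fraction.

    `keep_fraction` is a hypothetical stand-in for the user's judgement.
    Returns the indices of the retained candidates (best first) and all scores.
    """
    scores = np.array([chi2(c, data, sigma) for c in candidates])
    n_keep = max(1, int(round(keep_fraction * len(candidates))))
    kept = np.argsort(scores)[:n_keep]
    return kept, scores
```

In an iterative scheme of the kind described above, the retained candidates
would seed the statistical analysis that generates the next set of
initial-scale PDFs.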

2. Method 

SOMs, at variance with standard NNs, are based on competitive learning [5].
In competitive learning one defines a number of "filters" that respond
differently to the initial inputs, in such a way that one or a few of the
filters are "winners" producing a high output. The "winners" create
negative feedback so that only they and their neighbours get reinforced
through the various cycles or, in other words, get updated in learning.
More technically, a SOM is an algorithm that maps the training data onto a
neural network in a topologically ordered way. The mapping proceeds by
selecting the neuron, N_w, that best matches each data sample according to
a metric, M_d. Each neuron is represented on a two-dimensional grid, with
coordinates x_i = (x_1, x_2). A weighted average of each neuron N_i in the
grid with the data sample is then performed, where the weight, w_i, is
computed from the distance of N_i to N_w according to a metric, M_q, and a
given neighborhood radius. M_q defines the topology of the grid. This
procedure is iterated with smaller radii until it saturates.
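
The training cycle just described can be sketched generically as below.
This is not the SOMPDF implementation: the Euclidean data metric, the
city-block grid metric, the Gaussian neighborhood function, the learning
rate and the list of radii are all assumptions chosen for illustration, and
the neurons here are plain vectors rather than PDFs.

```python
import numpy as np


def train_som(data, grid_shape=(4, 4), radii=(2.0, 1.0, 0.5), rate=0.5, seed=0):
    """Train a small SOM; returns the neuron vectors and their grid coordinates."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates (x_1, x_2) of each neuron.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the neurons from randomly chosen data samples.
    neurons = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    for radius in radii:  # iterate with progressively smaller radii
        for sample in data[rng.permutation(len(data))]:
            # Winner N_w: best match to the sample (assumed Euclidean data metric).
            winner = np.argmin(np.linalg.norm(neurons - sample, axis=1))
            # Grid distance of every neuron to the winner (assumed city-block metric).
            grid_dist = np.abs(coords - coords[winner]).sum(axis=1)
            # Weight w_i from the grid distance and the neighborhood radius.
            w = rate * np.exp(-((grid_dist / radius) ** 2))
            # Weighted average of each neuron with the data sample.
            neurons += w[:, None] * (sample - neurons)
    return neurons, coords


def best_matching_unit(neurons, sample):
    """Index of the neuron closest to `sample` in the data metric."""
    return int(np.argmin(np.linalg.norm(neurons - sample, axis=1)))
```

With two well-separated clusters of input vectors, samples from different
clusters end up matched to different neurons, which is the clustering
behaviour the method relies on.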

For our specific problem, the neurons correspond to the PDFs; the data are
"synthetic data" (randomized samples of the original data). The metric M_q
that defines the topology of the map is:

    [Equation (1) is missing from the source text; only the index range
    "= 1, 2" survives.]

An important aspect of our procedure is that PQCD evolution is considered
at every step. Our preliminary results are displayed in Fig. 1, showing
that our algorithm indeed represents a robust method to determine both the
structure function F_2(x_Bj,Q^2) and the gluon distribution G(x_Bj,Q^2),
evolved to Q^2 = 28.7 GeV^2.

[Figure 1 panels: "F_2 vs x at Q^2 = 28.7, with 1-sigma error band" (left);
"F_2 vs x at Q^2 = 207.4, with 1-sigma error band" (right)]

Figure 1. Left: Structure function F_2(x,Q^2) from SOMPDFs fit, plotted
vs. x in the range 10^-5 < x < 1, at Q^2 = 28.7 GeV^2; Right: F_2(x,Q^2) in
the same range of x, at Q^2 = 207 GeV^2.

We conclude that the proposed SOMPDFs introduce a change of criteria with
respect to NNPDFs, aimed at bringing "theory" back in the loop, at variance
with seeking full automation of the fitting procedure. They are therefore
placed at the intersection between traditional GF methods and NN
approaches. SOMPDFs have the following additional advantages over generic
Genetic Algorithms that might help in future work to identify the role of
different parameters: i) Visualization; ii) Dimensionality reduction;
iii) Clustering (a study is under way to determine what features of PDFs
produce given patterns of clustering). As future practical goals, we hope
to extend our investigation to additional "filters" other than the
χ^2 [6], and to study the implementation of SOMPDFs in actual data analyses
at the LHC using both nucleon and nuclear data.